Deep neural network based supervised speech segregation generalizes to novel noises through large-scale training
نویسندگان
چکیده
Deep neural network (DNN) based supervised speech segregation has been successful in improving human speech intelligibility in noise, especially when DNN is trained and tested on the same noise type. A simple and effective way for improving generalization is to train with multiple noises. This letter demonstrates that by training with a large number of different noises, the objective intelligibility results of DNN based supervised speech segregation on novel noises can match or even outperform those on trained noises. This demonstration has an important implication that improving human speech intelligibility in unknown noisy environments is potentially achievable. OSU Dept. of Computer Science and Engineering Technical Report #02, 2015
منابع مشابه
Large-scale training to increase speech intelligibility for hearing-impaired listeners in novel noises.
Supervised speech segregation has been recently shown to improve human speech intelligibility in noise, when trained and tested on similar noises. However, a major challenge involves the ability to generalize to entirely novel noises. Such generalization would enable hearing aid and cochlear implant users to improve speech intelligibility in unknown noisy environments. This challenge is address...
متن کاملDeep Class-Wise Hashing: Semantics-Preserving Hashing via Class-wise Loss
Deep supervised hashing has emerged as an influential solution to large-scale semantic image retrieval problems in computer vision. In the light of recent progress, convolutional neural network based hashing methods typically seek pair-wise or triplet labels to conduct the similarity preserving learning. However, complex semantic concepts of visual contents are hard to capture by similar/dissim...
متن کاملLarge-scale, sequence-discriminative, joint adaptive training for masking-based robust ASR
Recently, it was shown that the performance of supervised timefrequency masking based robust automatic speech recognition techniques can be improved by training them jointly with the acoustic model [1]. The system in [1], termed deep neural network based joint adaptive training, used fully-connected feedforward deep neural networks for estimating time-frequency masks and for acoustic modeling; ...
متن کاملLong Short-Term Memory for Speaker Generalization in Supervised Speech Separation
Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. T...
متن کاملSpeech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015